1 Introduction

write introduction here



2 Observed vs True data

In this section we compare the observed dataset with the true dataset.

Table 2.1: Observed Data
age  smoke  sex     intensity  active  rest  height  weight  bmi
42   no     female  high       NA      75    NA      NA      22.4
31   NA     male    low        NA      62    NA      NA      23.8
36   no     male    low        109     76    182     78.0    23.5
31   no     female  low        78      62    164     53.9    20.0
42   no     male    low        NA      66    189     NA      23.4

Table 2.1: True Data
age  smoke  sex     intensity  active  rest  height  weight  bmi
42   no     female  high       94      75    161     58.1    22.4
31   no     male    low        86      62    184     80.6    23.8
36   no     male    low        109     76    182     78.0    23.5
31   no     female  low        78      62    164     53.9    20.0
42   no     male    low        103     66    189     83.6    23.4

2.1 Descriptives

Neither the mean nor the variance of age and rest changed, since these variables have no missing values.

The mean of active is also almost entirely unaffected. The variance of active changed slightly in the observed data, but this difference is simply due to sampling variability (we deleted about 40% of the observations). The missing values in active are MCAR, so we would not expect any substantial change in the marginal distribution of active.

The mean of height is also almost entirely unaffected. The variance of height changed slightly in the observed data, but this difference is simply due to sampling variability (we deleted about 30% of the observations). The missing values in height are MCAR, so we would not expect any substantial change in the marginal distribution of height.

The mean of weight is also almost entirely unaffected. The variance of weight changed slightly in the observed data, but this difference is simply due to sampling variability (we deleted about 57% of the observations). The missing values in weight are MCAR, so we would not expect any substantial change in the marginal distribution of weight.

The mean of bmi is also almost entirely unaffected. The variance of bmi changed slightly in the observed data, but this difference is simply due to sampling variability (we deleted about 30% of the observations). The missing values in bmi are MCAR, so we would not expect any substantial change in the marginal distribution of bmi.

Table 2.2: Means and variances in true and observed dataset
Variables  \(M_{obs}\)  \(M_{true}\)  \(Var_{obs}\)  \(Var_{true}\)
Age        38.52        38.52         149.73         149.73
Active     92.58        93.13         383.05         383.04
Rest       69.83        69.83         120.78         120.78
Height     174.50       173.99        100.66         105.29
Weight     73.91        73.58         260.26         274.85
Bmi        24.11        24.06         12.91          13.38
Note. obs = Observed Dataset, true = True Dataset
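
The comparison behind Table 2.2 can be illustrated on the five example rows of Table 2.1. The sketch below assumes that missing entries are simply dropped before each statistic is computed (an available-case calculation). The helper name `describe` and the use of Python are assumptions made for illustration; this is not the code that produced the table.

```python
from statistics import mean, variance

# Height column from the five example rows in Table 2.1; None marks NA.
height_obs = [None, None, 182, 164, 189]
height_true = [161, 184, 182, 164, 189]


def describe(values):
    """Return (mean, sample variance) of the non-missing values."""
    present = [v for v in values if v is not None]
    return mean(present), variance(present)


m_obs, var_obs = describe(height_obs)
m_true, var_true = describe(height_true)
print(f"observed: mean={m_obs:.2f}, var={var_obs:.2f}")
print(f"true:     mean={m_true:.2f}, var={var_true:.2f}")
```

With only five rows the two sets of statistics differ noticeably. This is the sampling variability discussed above, and it shrinks as the number of observed values grows.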

Over here categorical data descriptions

2.2 Correlations

Table 2.3: Correlations of observed data
age smoke sex intensity active rest height weight bmi
age 1.0000000 0.0050124 -0.1700221 0.2109746 -0.4914321 -0.3883588 0.1937835 0.2486319 0.1766626
smoke 0.0050124 1.0000000 -0.0871943 -0.2907560 0.1493831 0.2325334 0.1780112 0.1772761 0.1807098
sex -0.1700221 -0.0871943 1.0000000 -0.0890488 0.1111608 0.0615947 -0.7305737 -0.6804187 -0.4179512
intensity 0.2109746 -0.2907560 -0.0890488 1.0000000 -0.3746708 -0.5493098 0.1327198 0.1195032 0.0170766
active -0.4914321 0.1493831 0.1111608 -0.3746708 1.0000000 0.5595138 -0.0021729 0.0130248 0.0543435
rest -0.3883588 0.2325334 0.0615947 -0.5493098 0.5595138 1.0000000 -0.1974799 -0.1214278 0.0614009
height 0.1937835 0.1780112 -0.7305737 0.1327198 -0.0021729 -0.1974799 1.0000000 0.7769770 0.3380449
weight 0.2486319 0.1772761 -0.6804187 0.1195032 0.0130248 -0.1214278 0.7769770 1.0000000 0.8761512
bmi 0.1766626 0.1807098 -0.4179512 0.0170766 0.0543435 0.0614009 0.3380449 0.8761512 1.0000000
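
When variables have missing values, a correlation can only be computed on rows where both variables are present. The sketch below shows one common choice, pairwise deletion, on two numeric columns from the observed rows of Table 2.1. Whether Table 2.3 was computed with pairwise or listwise deletion is not stated, so this choice is an assumption. The function name `pairwise_pearson` is hypothetical, and categorical variables such as smoke and sex would first need a numeric coding.

```python
from math import sqrt

# Height and bmi from the observed rows of Table 2.1; None marks NA.
height = [None, None, 182, 164, 189]
bmi = [22.4, 23.8, 23.5, 20.0, 23.4]


def pairwise_pearson(x, y):
    """Pearson correlation using only rows where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)


r = pairwise_pearson(height, bmi)
n_pairs = sum(h is not None for h in height)
print(f"r(height, bmi) on {n_pairs} complete pairs: {r:.3f}")
```

Under pairwise deletion each cell of a correlation matrix may rest on a different subset of rows, which is worth keeping in mind when comparing coefficients across cells.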

2.3 Regression

Table 2.4: Regression analysis of True and Observed Data
Term           \(\beta_{obs}\)  \(SE_{obs}\)  \(p_{obs}\)  \(\beta_{true}\)  \(SE_{true}\)  \(p_{true}\)
(Intercept)    78.444           14.34         0.000        80.384            9.03           0.000
age            -0.809           0.11          0.000        -0.883            0.07           0.000
bmi            1.681            0.55          0.003        1.776             0.35           0.000
sexfemale      32.756           20.78         0.117        43.460            14.16          0.002
smokeyes       1.615            2.91          0.580        3.516             1.99           0.078
bmi:sexfemale  -1.131           0.88          0.199        -1.674            0.60           0.006
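
Because the model contains a bmi:sexfemale interaction, the bmi coefficient is the slope for the reference category (male), and the slope for females is the sum of the bmi and interaction coefficients. The short sketch below works this out from the values in Table 2.4; the helper name `bmi_slopes` is hypothetical and used only for illustration.

```python
# Coefficients copied from Table 2.4 (male is the reference category).
coef_obs = {"bmi": 1.681, "bmi:sexfemale": -1.131}
coef_true = {"bmi": 1.776, "bmi:sexfemale": -1.674}


def bmi_slopes(coef):
    """Return the bmi slope for males and for females."""
    male = coef["bmi"]
    female = coef["bmi"] + coef["bmi:sexfemale"]
    return male, female


for label, coef in (("observed", coef_obs), ("true", coef_true)):
    male, female = bmi_slopes(coef)
    print(f"{label}: bmi slope male={male:.3f}, female={female:.3f}")
```

In the true data the female slope is close to zero (about 0.102), whereas in the observed data it is about 0.550. The interaction is also estimated with a larger standard error in the observed data and is no longer significant there.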

3 Missingness

There are 540 missing values in total. Age, sex, intensity, and rest have none; smoke has 58, height 92, bmi 93, active 123, and weight 174. Moreover, 132 rows are completely observed, 15 rows have one missing value, 37 rows have two, 52 rows have three, 55 rows have four, and 15 rows have five.
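
The counts above can be cross-checked against each other. The sketch below uses only the numbers reported in this section; the total of 306 rows is derived from the row counts rather than stated in the text, and the variable names are chosen for illustration.

```python
# Counts as reported in the text of this section.
missing_per_variable = {
    "age": 0, "sex": 0, "intensity": 0, "rest": 0,
    "smoke": 58, "height": 92, "bmi": 93, "active": 123, "weight": 174,
}
# Number of rows having k missing values, k = 0..5.
rows_by_missing_count = {0: 132, 1: 15, 2: 37, 3: 52, 4: 55, 5: 15}

n_rows = sum(rows_by_missing_count.values())
total_by_variable = sum(missing_per_variable.values())
total_by_row = sum(k * n for k, n in rows_by_missing_count.items())

# Share of missing entries per variable, in percent.
pct_missing = {v: 100 * m / n_rows for v, m in missing_per_variable.items()}

print(n_rows, total_by_variable, total_by_row)
for name, pct in pct_missing.items():
    if pct > 0:
        print(f"{name}: {pct:.1f}% missing")
```

Both ways of counting give 540 missing values, and the per-variable shares agree with the approximate percentages quoted in Section 2.1 (about 40% for active, 30% for height and bmi, and 57% for weight).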

Figure 3.1: pattern of the missingness




3.1 Looking for the missingness

Figure 3.2: comparing the distribution of the observed and true dataset

3.2 Missingness of weight

Figure 3.3: Looking whether the missingness of weight is MAR
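
One way to probe whether missingness depends on observed variables is to split a fully observed variable by whether the target variable is missing and compare the two groups. The sketch below does this for weight and age on the observed rows of Table 2.1. It is an illustration under that assumption, not the procedure used to produce the figure, and the function name `split_by_missingness` is hypothetical.

```python
from statistics import mean

# Age and weight from the observed rows of Table 2.1; None marks NA.
age = [42, 31, 36, 31, 42]
weight = [None, None, 78.0, 53.9, None]


def split_by_missingness(target, covariate):
    """Split covariate values by whether the target is missing in that row."""
    missing = [c for t, c in zip(target, covariate) if t is None]
    observed = [c for t, c in zip(target, covariate) if t is not None]
    return missing, observed


age_missing, age_observed = split_by_missingness(weight, age)
print(f"mean age, weight missing:  {mean(age_missing):.2f}")
print(f"mean age, weight observed: {mean(age_observed):.2f}")
```

Five rows are far too few to support a conclusion. On the full dataset, a marked difference between the two groups would be evidence against MCAR, although it cannot by itself distinguish MAR from MNAR.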

3.3 Missingness of height

Figure 3.4: Looking whether the missingness of height is MAR

3.4 Missingness of Active

Figure 3.5: Looking whether the missingness of active is MAR

3.5 Missingness of Bmi

Figure 3.6: Looking whether the missingness of bmi is MAR

3.6 Missingness of Smoke

Figure 3.7: Looking whether the missingness of smoking is MAR